Integrated Use of Internal and External Evidence in the Alignment of Multi-Word Named Entities

Authors

  • Takeshi Kutsumi
  • Takehiko Yoshimi
  • Katsunori Kotani
  • Ichiko Sata
  • Hitoshi Isahara
Abstract

This paper proposes a method for extracting English multi-word named entities and their Japanese equivalents from a parallel corpus. The aim of our research is to extract multi-word named entities that are not listed in the dictionary of an English-to-Japanese MT system and that appear infrequently in a parallel corpus. Our method aligns candidates on the basis of two kinds of external evidence, provided by the context in which a bilingual pair appears, and two kinds of internal evidence within the pair. Each kind of evidence is assigned a score, and the aggregate score is computed as a weighted sum of these scores. The appropriate weights are estimated by logistic regression analysis. In an experiment using a parallel corpus of the Yomiuri Shimbun and The Daily Yomiuri, 86.36% of the extracted bilingual pairs with the highest scores were judged to be correct.
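The scoring scheme described in the abstract (a weighted sum of four evidence scores, with weights estimated by logistic regression) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the four kinds of evidence or describe the fitting procedure, so the function names, the toy score vectors, the labels, and the use of plain batch gradient descent are hypothetical stand-ins rather than the authors' implementation.

```python
import math


def aggregate_score(scores, weights, bias=0.0):
    """Weighted sum of evidence scores plus an intercept term."""
    return bias + sum(w * s for w, s in zip(weights, scores))


def sigmoid(z):
    """Logistic function mapping an aggregate score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_weights(examples, labels, lr=0.5, epochs=2000):
    """Estimate weights by batch gradient descent on the logistic loss.

    This is a generic logistic regression fit, assumed here as a stand-in
    for the "logistic regression analysis" mentioned in the abstract.
    """
    n_features = len(examples[0])
    n_examples = len(examples)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(examples, labels):
            # Prediction error for this candidate pair.
            err = sigmoid(aggregate_score(x, weights, bias)) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n_examples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_examples
    return weights, bias


# Toy data (invented, not from the paper): each row holds four evidence
# scores for one candidate pair, in the hypothetical order
# [external_1, external_2, internal_1, internal_2].
examples = [
    [0.9, 0.8, 0.7, 0.9],  # judged correct
    [0.8, 0.6, 0.9, 0.7],  # judged correct
    [0.2, 0.1, 0.3, 0.2],  # judged incorrect
    [0.1, 0.3, 0.2, 0.1],  # judged incorrect
]
labels = [1, 1, 0, 0]

weights, bias = fit_logistic_weights(examples, labels)

# Rank candidate pairs by aggregate score, highest first.
ranked = sorted(
    range(len(examples)),
    key=lambda i: aggregate_score(examples[i], weights, bias),
    reverse=True,
)
print("ranking of candidate pairs:", ranked)
```

With weights fitted this way, candidate pairs can be sorted by aggregate score so that the highest-scoring pairs are examined first, which corresponds to the evaluation of top-scoring pairs reported in the abstract.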


Similar resources

PAYMA: A Tagged Corpus of Persian Named Entities

The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...


Named Entity Recognition in Persian Text using Deep Learning

Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...


Anatomical Alignment of Lower Extremity in Subjects With Genu Valgum and Genu Varum Deformities

Purpose: Changes in lower extremity alignment in individuals with abnormalities of this segment are unclear. The present study aimed to compare lower extremity alignment in subjects with genu valgum and genu varum deformities and healthy subjects in order to understand the lower extremity alignment changes in this group. Methods: This was a causal-comparative study. The sample comprised 120 m...


An Integrated Model for the Iranian General Dentistry Curriculum

Background: The purpose of this study is to present an integrated model for the Iranian general dentistry curriculum. Methods: In this study, the qualitative method of focus groups was used. First, using library studies and analysis and interpretation of the resulting information, possible types of integration in the dental curriculum and real experiences of some of the world's leading universit...


A Neural-Network-Based System for Recognizing and Classifying Named Entities in Persian Texts

Named Entity Recognition (NER) is a fundamental task in natural language processing and is also known as a subset of information extraction. We seek to locate and classify named entities in text into predefined categories such as the names of persons, organizations, locations, expressions of times, etc. Named Entity Recognition for English texts has been widely researched in recent years, howev...




Publication date: 2004